Serial No.: 10/065,856 

Amendments to the Specification: 



Please add the following paragraphs after paragraph [0120]: 

One such example, described in Provisional Application Serial No. 60/398,958 
and attorney docket No. 3508.1 (Provisional Application Serial No. 60/422,220), each 
incorporated by reference above, includes a process of determining relative transcript 
concentrations from hybridization intensity data from experiments with probe arrays and 
gene structure information using model fitting. 

In the presently described example, a transcript may include several gene features, 
which refer to the sequences extracted from different splicing variants of the gene. A 
gene feature can be an exon, an intron, or a junction (exon-exon junction, exon-intron 
junction, or intron-exon junction). An exon feature can be partitioned further depending 
on whether the exon is a cassette exon or an exon overlapping with others. Probes 
targeted to these features can be mapped to each of the transcripts. 

Gene structure includes all the transcripts of each gene and the feature composition 
for each transcript. For example, a gene could have two transcripts, A and B, and five 
features, where transcript A includes 3 of the 5 features while transcript B includes 4. 
The relationship between features and transcripts can be represented by a matrix with 
values of 1s or 0s, described as follows: 

Let $G$ be an $m$ by $n$ matrix, where $m$ is the number of transcripts and $n$ 
is the number of features for a gene. Column $F^{f}$ denotes feature $f$, and 
$T_j^{k}$ denotes the $k$th transcript; it is also used to denote the concentration of 
transcript $k$ in the $j$th experiment. $g_{kf}$ is the element of this matrix for 
transcript $k$ and feature $f$; its value is either 1 or 0. 

The matrix relationship can be written using the following equation, where 
$x_j^{f}$ denotes the concentration measured by feature $f$ in experiment $j$. 
$x_j^{f}$ can be written as: 

$$x_j^{f} = \sum_{k=1}^{m} g_{kf}\, T_j^{k} \qquad (1)$$

Equation (1) therefore represents the gene structure. 
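
As an illustration of the gene-structure relationship, the following minimal Python sketch builds a 0/1 matrix for a hypothetical gene with two transcripts and five features (transcript A with 3 features, transcript B with 4) and derives the concentration seen by each feature in one experiment. The particular feature assignments, the concentration values, and the function name `feature_concentrations` are assumptions made for illustration only; they do not come from the specification.

```python
# Hypothetical gene structure: rows are transcripts, columns are features.
# Which features belong to which transcript is an illustrative assumption.
G = [
    [1, 1, 1, 0, 0],  # transcript A: features 1, 2, 3
    [1, 0, 1, 1, 1],  # transcript B: features 1, 3, 4, 5
]


def feature_concentrations(G, T):
    """Return x[f] = sum over transcripts k of G[k][f] * T[k] for one experiment."""
    n_features = len(G[0])
    return [
        sum(G[k][f] * T[k] for k in range(len(G)))
        for f in range(n_features)
    ]


# Assumed relative concentrations of transcripts A and B in one experiment.
T = [2.0, 1.0]
x = feature_concentrations(G, T)
print(x)  # [3.0, 2.0, 3.0, 1.0, 1.0]
```

Features shared by both transcripts (features 1 and 3 here) report the sum of the two concentrations, while features unique to one transcript report that transcript alone.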

Continuing with the present example, multiple probes may be employed to 
represent each feature in order to model the data. These probes typically have 
different properties; however, they measure the same concentration of a given 
transcript feature. 

A simple model may be adopted to express the relationship between probe 
properties, concentrations, and intensity measurements: 

$$y_{ij} = a_i x_j + \varepsilon_{ij} \qquad (2)$$

$$y_{ij} = a_i x_j + b_i + \varepsilon_{ij} \qquad (3)$$

In the above equations, $y_{ij}$ is the intensity measured by the $i$th probe in the 
$j$th experiment, $a_i$ represents the affinity term for probe $i$ (which is 
arbitrarily assigned), and $b_i$ represents the background index for the $i$th probe. 
These terms are probe-dependent. Also, $x_j$ represents the relative concentration of 
the feature in the $j$th experiment, and $\varepsilon_{ij}$ denotes the error term. 
The error term includes all factors not explained by the other terms; it is usually 
assumed to be normally distributed with mean 0 and variance $\sigma^2$. Formally, this 
can be written as $\varepsilon_{ij} \sim N(0, \sigma^2)$. 
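
The probe-level model with a background term can be illustrated by simulating intensities for one feature. This is a minimal sketch under stated assumptions: the affinity, background, and concentration values are invented, and `simulate_intensities` is a hypothetical helper name, not part of the described method.

```python
import random


def simulate_intensities(a, b, x, sigma, seed=0):
    """Simulate y[i][j] = a[i] * x[j] + b[i] + noise for probes i and experiments j.

    The noise is drawn from a normal distribution with mean 0 and standard
    deviation sigma, matching the error assumption of the model.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same data
    return [
        [a[i] * x[j] + b[i] + rng.gauss(0.0, sigma) for j in range(len(x))]
        for i in range(len(a))
    ]


# Illustrative (assumed) values: three probes for one feature, three experiments.
a = [1.0, 0.5, 2.0]    # probe affinities
b = [10.0, 20.0, 5.0]  # probe background indices
x = [1.0, 2.0, 4.0]    # relative feature concentration per experiment

y = simulate_intensities(a, b, x, sigma=0.1)
for row in y:
    print([round(v, 2) for v in row])
```

Each row holds one probe's readings across experiments. All probes track the same underlying concentrations but with different slopes (affinities) and offsets (backgrounds).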

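The above formulas can be rewritten as follows for feature $f$ of a given gene, 
with the superscript $f$ marking feature-specific quantities: 

$$y_{ij}^{f} = a_i^{f} x_j^{f} + \varepsilon_{ij} \qquad (4)$$

$$y_{ij}^{f} = a_i^{f} x_j^{f} + b_i^{f} + \varepsilon_{ij} \qquad (5)$$

Combining these equations with equation (1), we have for feature $f$ of a gene: 

$$y_{ij}^{f} = a_i^{f} \sum_{k=1}^{m} g_{kf}\, T_j^{k} + \varepsilon_{ij} \qquad (6)$$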

The difference between the predicted and observed intensity for each probe is then 
minimized. A loss function is required to penalize errors in prediction. Many 
types of loss functions may be used for this purpose, such as a squared error loss 
function or an absolute difference loss function. In the present example, the squared 
error loss function is applied to the model. 

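To minimize the squared difference between the predicted and observed intensity 
values for all the probes of each gene (a set of features), the equation can be written as: 

$$f(A,T) = \sum_{f}\sum_{j}\sum_{i} \left(y_{ij}^{f} - \hat{y}_{ij}^{f}\right)^2 = \sum_{f}\sum_{j}\sum_{i} \left(y_{ij}^{f} - a_i^{f}\Big(\sum_{k=1}^{m} g_{kf}\, T_j^{k}\Big) - b_i^{f}\right)^2$$

where the outer sums run over the features $f$, experiments $j$, and probes $i$ of the 
gene, and $\hat{y}_{ij}^{f}$ is the predicted intensity. 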
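
The loss computation can be sketched as follows. This example assumes a nested-list layout `y[f][i][j]` (feature, probe, experiment), a 0/1 gene-structure matrix `G[k][f]`, and transcript concentrations `T[k][j]`; these layouts, the function names, and all numeric values are illustrative assumptions. The squared error loss is the default, and the absolute difference loss is passed in as an alternative.

```python
def predicted_intensity(a, b, G, T, f, i, j):
    """Predicted intensity of probe i of feature f in experiment j."""
    x_fj = sum(G[k][f] * T[k][j] for k in range(len(G)))
    return a[f][i] * x_fj + b[f][i]


def total_loss(y, a, b, G, T, loss=lambda r: r * r):
    """Sum loss(observed - predicted) over all features, probes, and experiments."""
    total = 0.0
    for f in range(len(y)):
        for i in range(len(y[f])):
            for j in range(len(y[f][i])):
                residual = y[f][i][j] - predicted_intensity(a, b, G, T, f, i, j)
                total += loss(residual)
    return total


# Hypothetical gene: 2 transcripts, 3 features, 1 experiment.
G = [[1, 1, 0],
     [1, 0, 1]]
T = [[2.0], [1.0]]                 # transcript concentrations
a = [[1.0, 2.0], [1.0], [0.5]]     # affinities, per feature and probe
b = [[0.0, 0.0], [0.0], [0.0]]     # background indices
y = [[[3.5], [6.0]], [[1.0]], [[0.5]]]  # observed intensities

print(total_loss(y, a, b, G, T))            # squared error loss: 1.25
print(total_loss(y, a, b, G, T, loss=abs))  # absolute difference loss: 1.5
```

Here the residuals are 0.5, 0, -1.0, and 0, so the squared loss weights the larger error more heavily than the absolute loss does.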

To minimize $f(A,T)$, some constraints or penalty terms are needed in order to 
solve the equations. The following constraints may be added: 

$$\sum_{i=1}^{np} \left(a_i^{f}\right)^2 = \text{constant} \qquad (10)$$

$$a_i^{f} > 0 \qquad (11)$$

$$T_j^{k} > 0 \qquad (12)$$

Alternatively, the following penalty terms could be added to equations (7) and (8), 

nf np 

Maximum likelihood estimation is used. The solution may be obtained by iteratively 
solving for different sets of the parameters until convergence, yielding the relative 
concentration of each variant and the relative affinity term of each probe. 
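
One way such an iterative scheme might look is sketched below. It is not the filed method, only a minimal illustration under several assumptions: the background term is omitted, the squared error loss is used (which coincides with maximum likelihood under the normal error assumption), affinities and concentrations are updated by alternating closed-form and coordinate-wise least-squares steps, and the scale is fixed by a single global normalization of the affinities rather than the per-feature constraint (10). Concentrations are clipped at zero, which is slightly looser than constraint (12). All names and data values are hypothetical.

```python
import math


def feature_conc(G, T, f, j):
    """Concentration seen by feature f in experiment j: sum_k G[k][f] * T[k][j]."""
    return sum(G[k][f] * T[k][j] for k in range(len(G)))


def squared_loss(y, a, G, T):
    """Squared error between observed y[f][i][j] and a[f][i] * feature_conc."""
    return sum(
        (y[f][i][j] - a[f][i] * feature_conc(G, T, f, j)) ** 2
        for f in range(len(y))
        for i in range(len(y[f]))
        for j in range(len(y[f][i]))
    )


def fit_gene(y, G, n_iter=50, floor=1e-9):
    """Alternate between affinity and concentration updates for one gene."""
    n_feat, m, n_exp = len(y), len(G), len(y[0][0])
    a = [[1.0] * len(y[f]) for f in range(n_feat)]
    T = [[1.0] * n_exp for _ in range(m)]
    history = [squared_loss(y, a, G, T)]
    for _ in range(n_iter):
        # Step 1: with T fixed, each affinity has a closed-form 1-D solution.
        for f in range(n_feat):
            x = [feature_conc(G, T, f, j) for j in range(n_exp)]
            denom = sum(v * v for v in x)
            if denom > 0.0:
                for i in range(len(y[f])):
                    num = sum(y[f][i][j] * x[j] for j in range(n_exp))
                    a[f][i] = max(floor, num / denom)  # keep affinities positive
        # Step 2: with affinities fixed, update one concentration at a time.
        for k in range(m):
            for j in range(n_exp):
                num = denom = 0.0
                for f in range(n_feat):
                    if G[k][f]:
                        # Contribution of the other transcripts to feature f.
                        rest = feature_conc(G, T, f, j) - T[k][j]
                        for i in range(len(y[f])):
                            num += a[f][i] * (y[f][i][j] - a[f][i] * rest)
                            denom += a[f][i] ** 2
                if denom > 0.0:
                    T[k][j] = max(0.0, num / denom)  # clip at zero (assumption)
        # Step 3: fix the arbitrary overall scale; the fit itself is unchanged.
        count = sum(len(row) for row in a)
        s = math.sqrt(sum(v * v for row in a for v in row) / count)
        a = [[v / s for v in row] for row in a]
        T = [[v * s for v in row] for row in T]
        history.append(squared_loss(y, a, G, T))
    return a, T, history


# Hypothetical noise-free data: 2 transcripts, 3 features, 3 experiments.
G = [[1, 1, 0],
     [1, 0, 1]]
true_a = [[1.0, 0.5], [2.0, 1.0], [1.5, 0.5]]
true_T = [[2.0, 1.0, 3.0], [1.0, 2.0, 0.5]]
y = [
    [[true_a[f][i] * feature_conc(G, true_T, f, j) for j in range(3)]
     for i in range(len(true_a[f]))]
    for f in range(3)
]

a_hat, T_hat, history = fit_gene(y, G)
print("initial loss:", history[0])
print("final loss:  ", history[-1])
```

Because each update minimizes the loss exactly in one block of parameters while the others are held fixed, the loss does not increase from one iteration to the next. The recovered values are relative: affinities and concentrations are determined only up to the chosen scale.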